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Abstract The statistics of natural catastrophes contains very counter-intuitive re- 
sults. Using earthquakes as a working example, we show that the energy radiated 
by such events follows a power-law or Pareto distribution. This means, in theory, 
that the expected value of the energy does not exist (is infinite), and in practice, that 
the mean of a finite set of data in not representative of the full population. Also, the 
distribution presents scale invariance, which implies that it is not possible to define 
a characteristic scale for the energy. A simple model to account for this peculiar 
statistics is a branching process: the activation or slip of a fault segment can trigger 
other segments to slip, with a certain probability, and so on. Although not recog- 
nized initially by seismologists, this is a particular case of the stochastic process 
studied by Galton and Watson one hundred years in advance, in order to model the 
extinction of (prominent) families. Using the formalism of probability generating 
functions we will be able to derive, in an accessible way, the main properties of 
these models. Remarkably, a power-law distribution of energies is only recovered 
in a very special case, when the branching process is at the onset of attenuation and 
intensification, i.e., at criticality. In order to account for this fact, we introduce the 
self-organized critical models, in which, by means of some feedback mechanism, 
the critical state becomes an attractor in the evolution of such systems. Analogies 
with statistical physics are drawn. The bulk of the material presented here is self- 
contained, as only elementary probability and mathematics are needed to start to 
read. 
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2 Alvaro Corral and Francesc Font-Clos 

1 The Statistics of Natural Hazards 



Only fools, charlatans and liars predict earthquakes 
C. R Richter 



Men, and women, have always been threatened by the dangers of Earth: volcanic 
eruptions, tsunamis, earthquakes, hurricanes, floods, etc. Sadly, still in the 21st cen- 
tury our societies have not been able to get rid of such a sword of Damocles. But 
are natural catastrophes submitted to the caprices of the gods? Or do these disasters 
contain some hidden patterns or regularities? The first view has been dominant for 
many centuries in the history of humankind, and it has been only in recent times that 
a more rational perspective has started to consolidate. 



1.1 The Gutenberg-Richter law 

One of the first laws quantifying the occurrence of a natural hazard was proposed 
for earthquakes by the famous seismologists Beno Gutenberg and Charles F. Richter 
in the 1940's, taking advantage from the recent development of the first magnitude 
scale by Richter himself. The Gutenberg-Richter law is quite simple: if one counts 
the number of earthquakes in any seismically active region of the world during a 
long enough period of time, one must find that for each 100 earthquakes of magni- 
tude M greater or equal than 3 there are, approximately (on average), 10 earthquakes 
with M > 4, one earthquake with M > 5, and so on ( [Gutenberg and Richter[|1944{ 
Utsu[|1999t|Kanamori and Brodsky||2004| ). So, the vast majority of events are the 



smallest ones, and, fortunately, only very few of them can become catastrophic, 
maintaining a constant proportion between their number. 

It is not possible to measure all earthquakes on our planet, but for some areas 
with very accurate seismic monitoring it has been found that the Gutenberg-Richter 
law holds down to magnitude minus 4 ( [Kwiatek et al. 2010); this corresponds to 



small rock cracks of a few centimeters in length (negative magnitudes are introduced 
to account for the fact that there can be earthquakes smaller than those of zero 
magnitude). An d, more remarkably, for nanofracture experiments in the laboratory 



( Astrom et al.| [2006), the law has been verified up to magnitude below -13. The 



scarcity of the big events contained in the law leaves as open the question about 
which is its upper limit of validity. 

Despite not being recognized or mentioned by Gutenberg and Ritchter in their 
original paper (1944), any reader with a minimum knowledge of probability and 
statistics will immediately realize that the Gutenberg-Richter law implies an expo- 
nential distribution of the magnitudes of earthquakes, i.e., 

Dm{M) c< 10-''^, (1) 
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(a) Beno Gutenberg 



(b) Charles Richter 



Fig. 1: Seismologists Beno Gutenberg and Charles F. Richter (photos from 
seismo . berkely . edu). 



with Dm{M) the probability density of M, the parameter b taking a value close to 1, 
and the symbol oc standing for proportionality (with the constant of proportionality 
ensuring proper normalization). 

But which is the meaning of the Gutenberg-Richter law, in addition to provide 
an easy-to-remember relationship between the relative abundances of earthquakes? 
The interpretation depends, of course, on the meaning of magnitude, which we have 
avoided to define. In fact, there is no a unique magnitude, but several of them, sec- 
ond, magnitudes do not have physical dimensions (i.e., units), and third, "magni- 
tudes reflect radiation only from subportions of the rupture, and they saturate above 
certain size, rather than giving a physical characterization of the entire earthquake 
source" ( Ben-Zion] |2QQ8| ). More in-depth understanding comes from the energy ra- 
diated by an earthquake, which is believed to be an exponential function of its mag- 
nitude ^Kanamori and Brodsky[|2QQ4] ), that is. 



(2) 



with a proportionality factor close to 60 kJ (Utsu 1999 ); so, an increase by 1 in the 
magnitude implies an increase in energy by a factor VlOOO 32. Thus, an earth- 
quake of magnitude 9 radiates as much energy as 1000 earthquakes of magnitude 7, 
or as 10^ of magnitude 5. 
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One can reformulate then the Gutenberg-Richter law in terms of the energy. In- 
deed, the probability of an event is "independent" of the variable we use to describe 
it, and so, 

De{E)=Dm{M)'^, (3) 

with De (E) the probability density of the energy. Using equation ([2]), we can express 
M as a function of E, 

Moclog^, (4) 

and differentiate to obtain dM/dE, 

dM 1 
dE E 

so that equation ^ reads: 

D,(£)ocl0-^-i = (l0¥)--i=£-fl. (6) 

Summarizing, this straightforward change of variables leads to 

1 2b 
^e{E) - with a = 1 + y , (7) 

and this is just the so-called power-law distribution, or Pareto distribution fNewmai^ 



anl 





2005 ), with exponent a around 1.67 when b is close to 1. Notice from equation 
that in order that Z)^ (£") is a proper probability density function, it has to be defined 
above a minimum energy Emin > , otherwise (if Efnin = 0), it cannot be normalized. 
Although the true value of Emm cannot be measured (it is too small), this parameter 
is not important as it does not influence any properties of earthquakes. 

Figure [2] displays the probability density of the seismic moment for worldwide 
shallow earthquakes ( |Kagan[[2010) ; this variable is assumed to be proportional to 



the energy, but much easier to measure accurately ( [Kanamori and Brodsky[|2004] ), 
and so, it should also be power-law distributed, with the same exponent. The straight 
line in the plot is the defining characteristic of a power law in double logarithmic 
scale, as logZ)^ (E) =C—a logE. A fit by maximum likelihood estimation ( [Clauset 
etaL] [20091 Nters et al.| [2QT0 ) yields a 1.68. 



Two important properties of power-law distributions are scale invariance (with 
some limitations due to the normalization condition) and divergence of the mean 
value (if the exponent a is below or equal to 2). These are explained in the Ap- 
pendix. 

To conclude this subsection, let us mention that the power-law distribution of 
sizes is not a unique characteristic of earthquakes. It has been claimed that many 
other natural hazards are also power-law distributed, although with different ex- 



ponents (and maybe with a lower or an upper cutoff): tsunamis (Burroughs and 



Tebbens ,"2005), landslides, rockfalls fMalamudV 2004| ), volcanic eruptions ( McClel 



land et al.,.1989,,Lahaie and Grasso,.1998j , hurricanes ( [Corral et aLl|2010| ), rainfaU 
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seismic moment (Nm) 



Fig. 2: Estimation of the probability density of seismic moment for worldwide shal- 
low earthquakes (in log-log scale), using the so-called CMT catalog ( Kagan 2010| ). 
A power law fit results in an exponent a = 1.68. Radiated energy should give the 
same power law behavior. Deviations at small values of the seismic moment are 
attributed to the incompleteness of the catalog. 



( [Peters et"aLl |2Q1Q| ), auroras ( [Freeman and Watkins[ [2002'), forest fires ( [Malamudj 



et al. 



2005[ )... As the reader will figure out, some of the facts that we will explain 



having in mind earthquakes can also be applied to some of these natural hazards, but 
maybe not to all of them. It is an open question to distinguish between these different 
cases. For an account of power-law distributions in other areas beyond geoscience 
see the excellent review by [Newm^ ( [2005| ) . 



1.2 A first model for earthquake occurrence 



As far as we know, a first attempt to develop an earthquake model in order to explain 
the Gutenberg-Richter law was undertaken by Michio Otsuka in the early 1970's 
( Otsuka 1971 1972[ Kanamori and Mori 2000[ ). He used as a metaphor the popular 
Chinese game of go, although we will formulate the model in relation to the game 
of domino, probably more familiar to the potential readers. 

Instead of playing domino, we are going to play a different game with their 
pieces. The idea is to make the domino pieces to topple, as in the well-known 
contests and attempts to break a Guinness world record, but with two important 
differences. First, the pieces are not put in a row, but, rather, they constitute a kind 
of tree. Second, when one piece topples, one does not know what will happen next, 
i.e., if some other pieces will topple in turn (and how many will) or not. So, we have 
a stochastic cascade process that supposedly mimics the rupture that takes place 
in a seismic fault during an earthquake. The tree of domino pieces constitutes the 
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fault, and each piece is a small fault patch, or element. The earthquake is the chain 
reaction of toppling of pieces (i.e., failures of patches). 




Fig. 3: Scheme of Otsuka's model for earthquake ruptures. White circles correspond 
to the propagation of the rupture, whereas black ones indicate termination points 



(Otsuka 1972). 



Getting more concrete, Otsuka assumed that the tree representing the fault had 
a fixed number of branches at each position, or node, and that the toppling would 
propagate from each branch to the next element with a fixed probability p, indepen- 
dently of any other variable. So, the number of propagating branches resulting from 
a single one would follow the binomial distribution ( [Ross 2Q02| ). For instance, in 
Fig. [3j the possible number of branches per element is just 2. If a fixed elementary 
energy is associated to the failure of each patch, one can obtain the energy released 
in this process from the number of topplings, allowing the comparison with the 
Gutenberg-Richter law, see nevertheless Sec. 4.1 of the review by |Ben-Zion] ( [2008l ). 
So, the propagation of ruptures is considered a probability controlled phenomenon, 
in such a way that when an earthquake starts, it is not possible to know how big it 
will become. Later, we will see that this statement is stronger than what it looks like 
here. The usual domino effect, in which one toppling induces a new one for sure and 
so on, would correspond to the controversial concept of a characteristic earthquake 
( Stein[ [2002 ; Be n-Zion| |2008i Ka gan et al. 2012), an event that always propagates 
along the complete fault or fault system and would release always the same amount 
of energy. 

The novel and original model in geophysics explained in this subsection, pro- 
posed by Otsuka in the 1970's, was already known by a few mathematicians 100 
years in advance. It will take us the next pages to explain the distribution of energy 
in this model. 



2 Branching Processes 



Besides gambling, many probabilists have been interested in reproduction 
G. Grimmett and D. Stirzaker 
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Let us move to the Victorian (19th century) England. There, Sir Francis Galton, 
the polymath father of the statistical tools of correlation and regression, and cousin 
of Charles Darwin, was dedicated to many different affairs. In addition to the height 
of sons in relation to the heights of their fathers, he was concerned about the decay 
and even extinction of families that were important in the past, and about whether 
this decline was a consequence of a diminution in fertility provoked by the rise in 
comfort. If that were the case, population would be constantly fed by the contribu- 
tion of the lower classes ( [Watson and Galton] |1875| ). In order to better understand 
the problem, he devised a null model in which the number of sons of each men was 
random (the abundance of women was not considered to be a limitation). Despite 
the apparent simplicity of the model, Galton was not able to solve it, and made a 
public call for help. The call was also fruitless, and then Galton turned to the math- 
ematician and reverend Henry William Watson. 



Fig. 4: The fathers of the Galton-Watson process (photos from Wikipedia and 
www . wolf ramalpha . com, respectively). 



2.1 Definition of the Galton-Watson process 

Let us consider "elements" that can generate other elements and so on. These ele- 
ments may represent British aristocratic men that have some male descendants, (or, 
in a more fresh perspective, women from anywhere that give birth to her daugh- 




(a) Sir Francis Galton 



(b) Rev. Henry William Watson 
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Fig. 5: A realization of the Galton-Watson process. At the top, the tree associated 
to the process is shown, starting from the left (Zq = 1). At the bottom, the evolution 
of the number of elements originated in each generation t are displayed. The model 
for P{K = k) is binomial with n = 2 and = 1/2, corresponding to the critical case 
(see main text). 



ters, or, perhaps more properly, bacteria that replicate), neutrons that release more 
neutrons in a nuclear chain reaction, or fault patches that slip during an earthquake. 
The Galton-Watson process assumes that each of these elements triggers a random 
number K of offspring elements in such a way that each K is independent from 
that of the other elements and all K are identically distributed, with probabilities 



P{K = 0) = po, P{K = 1) = pu ...P{K = k) = pk, with = 0,l,...oo piarris 
|1963| ). (Naturally, the normalization condition imposes YjikPk = !•) 

The model starts with one single element, in what we call the zeroth generation of 
the process, as shown in Fig.|5] The K offsprings of this first element constitute the 
first generation. Let Zq = 1 denote the number of elements of the zeroth generation, 
Zi the number of elements of the first generation, etc. Obviously, by construction, 
P{Z\ = k) = pk. The number of elements in the ^ + 1 generation is obtained from 
the number of the previous generation t as 

Zt 

Z,^i = '£Ki, (8) 

i=l 
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with ^ > 0, where Ki corresponds to the number of offsprings of each element in the 
t generation. Equation ([S]) can be used to simulate the process in a straightforward 
way and will be very important to its analytical treatment, in order to calculate 
the probability distribution of Z^, for any t. Some readers may recognize that the 
variables Zo,Zi , . . . form a Markov chain, but this is not relevant for our purposes. 
And of course, Otsuka's earthquake model is a particular case of the Galton- Watson 
process corresponding to a binomial distribution for P{K = k). 



2.2 Generating functions 

An extremely convenient mathematical tool will be the probability generating func- 
tion ( [Grimmett and Stirzaker 2001 ). For the random variable K this is, by definition. 



k=Q 

where the brackets indicate expected value. The normalization condition guarantees 
that fxi^) is always defined at least in the x— interval [—1,1], although only the 
interval [0, 1] will be of interest for us. Of course, the same definition applies to any 
other random variable; in the concrete case of K (which represents the number of 
offsprings of any element) we may drop the subindex, i.e., //^(x) = f{x). 
Very useful and straightforward properties will be, 

1. /^(0)=P(/r = 0); 

2. /^(l) = 1 (by normalization); 

3. f'K{^)=i:^kPkk={K)=m- 

4. /^(x) > for X > (non-decreasing function); 

5. /^(x) > for X > (non-convex function, "looking from above"); 

the primes denoting derivatives (left-hand derivatives at x = 1). Note that although 
we illustrate these properties with the variable they are valid for the generating 
function of any other (discrete) random variable. So, the plot of a probability gen- 
erating function between and 1 is very constrained. We anticipate that two main 
cases will exist, depending on whether the expected value of ^ is m < 1 or whether 
m> I. This is natural, as the first case corresponds to a population that on average 
decreases from one generation to the next whereas in the second case the population 
grows, on average. 

Another property but not so straightforward is that the generating function of a 
sum of N independent identically distributed variables K (with N fixed) is the N-\h 
power of the generating function of K\ that is, if 

N 

^ = Y.^u (10) 

i=\ 



then 
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fE{x)=Mxf. (11) 

Indeed, 

Mx) = (x^) = {x^"') = (x^i •x^2...^iv) = (x^.)(x^2)...(^^) (12) 

where we can factorize the expected values due to statistical independence among 
the^/'s. 

In general, if the random variables Ki were not identically distributed (but still 
independent), the generating function of their sum would be the product of their 
generating functions. The demonstration is essentially the same as before, and one 
only needs to introduce new notation for the different generating functions. 

A following step is to consider that N is also a random variable, with generating 
function /iv(x). Then, 

Mx)=fN{fK{x)). (13) 

Note that equation (T3\ is just a generalization of equation (TT\ , i.e., now we calcu- 
late the expected value of the powers of /^(x) depending on the values that N make 
take. In any case, it is easy to demonstrate: denoting with (•)^. the average over the 
^;'s and with the average over N, we have 

Mx) = (x^) = {{x'^)kX = (/^W)^ = MMx)), (14) 

where the last equality is just the definition of the probability generating function 
of the random variable A^, evaluated at /^^(x). We stress that this is only valid for 
independent random variables. 



2.3 Distribution of number of elements per generation 

Going back to the Gallon- Watson branching process, where we know that Z^+i = 
we can identify Z^+i as E and Zt as N; then equation ([13]) reads, 

fz,^,{x) = fzAMx)) = fz,{f{x)) (15) 

(dropping the subindex K). As fz^ {x) = f{x), it is straightforward to see by induc- 
tion that the generating function of Z^, is given by 

/z,(x)=/(/(.../(x)))=/(x), (16) 

where the superindex t denotes composition t times. This is valid for ^ = 1,2, ... ; 
for r = we have, obviously, that fzo{x) = x (because Zq = 1 with probability 1). 
In words, the generating function of the number of elements for each generation 
is obtained by the successive compositions of f{x). This non-trivial result was first 



proved by Watson in 1874 (Harris 1963 ). 
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Here we present an illuminating result, which will be useful at some point in the 
chapter. Although, in general, the successive compositions of the generation func- 
tion leads to very complicated mathematical expressions, the moments of can be 
computed in a simple way ( [Harris [ |1963|. Using what we have learnt about gener- 
ating functions together wtih equation (|16|), the expected value of Zt is 

(17) 

x=l 

Let us then write 

J^/W = |^/(/-W) = /(/-W)|^/-W, (18) 

therefore, by induction, 

y\x)=f\f-\x))f\f-\x))---f\f{x))f\f{x))f{x). (19) 

Taking x=\ and using that all the generating functions have to be 1 at that point, 

(Z,)=/(iy=m'. (20) 

So, when m < 1 the mean number of elements per generation decreases exponen- 
tially, whereas when m > 1 this number increases, constituting a stochastic real- 
ization of Malthusian growth. For this reason m is sometimes called the branching 
ratio. When m = 1 the average size of the population is constant, but we will later 
see that this does not mean that the population reaches a stable state. Higher-order 
moments can be computed in a similar way, but they are not so useful as the mean. 

Another related issue is the one of the expected value of the number of el- 
ements per generation conditioned to the value of the previous generation, i.e., 
(Z^+i \Zt = Zt)- As when Zt is fixed, Z^+i = Y4L1 t^^^, taking the expected value, 

{Zt^,\Zt=Zt) = f^{Ki)=Ztm. (21) 

i=l 

This result can be used to relate branching processes with martingales ( [Grimmett 
[and Stirzaker[|200l"] ), but this does not have to bother us. 



2.5 The probability of extinction 



Extinction of the process is achieved when Zt = 0, for the first "time" (i.e., for the 
generation that yields Zt = for the first t). Then, all the subsequent Z's are also 
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zero, and extinction can be considered an "absorbing state", in this sense. We now 
see that the probability of extinction in the Gallon- Watson process is equal to one 
(extinction for sure) for m < 1 and is smaller than one for m > 1 . 

This result, which may be referred to as the Gallon- Watson-Haldane-Steffensen 
(criticality) theorem, was first proved by J. F. Steffensen, in the 1930's (being un- 



aware of the work by Gallon and Watson, and later progress by Haldane). As |Kendall 
1966| ) pointed out, after then, the same theorem "was to be re-discovered over and 



over again, especially during the (Second World) War period, and no doubt we have 
not yet seen its last re-discovery". Ironically, Kendall did not know that Irenee- Jules 
Bienayme knew the theorem, in its correct formulation, 30 years in advance Gallon 



and Watson and 85 years before Steffensen ( [Kendall||1975| )! 

Indeed, extinction may happen at the first generation, Z\ = 0, or at the second, 
Z2 = 0, etc. All these extinction events are included in Zt = 0, with ^ ^ 00; therefore, 
the probability of extinction Pext is given by 

Pext limP(Zi = or Z2 = or . . . or Z^ = 0) = limP(Z^ = 0) = lim/(0), (22) 

i.e., by the infinite iteration of the point x = through the generating function f{x) 
(using the key property that the probability of a zero value is the value of the gener- 
ating function at zero, and equation ([16]) again). 




0.2 0.4 0.6 0, 




0.2 0.4 0.6 0, 



Fig. 6: Probability generating function f{x) of the number of offsprings per element 
and iteration of the point x = through successive compositions of /. The fixed 
points correspond to the crossings of the diagonal; those closer to zero are also 
the attractors for the iteration. Left corresponds to a subcritical case and right to a 
supercritical case. The model is the binomial one, with n = 2. 



We now calculate the iteration f{0). In the interval [0, 1] the function f{x) is 
non-decreasing and non-convex, taking values from ;?o to 1. If the slope of f{x) 
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at X = 1, given by m = (K) = /'(I), is smaller than or equal to 1, then f{x) only 
crosses (or reaches) the diagonal at x = 1 (otherwise, f{x) would need to be convex 
somewhere), and the iteration of the point x = ends at the point x = I (which is 
the attractor, see Fig. [6]). Therefore, 

P,,, = lim/(0) = 1, (23) 

i.e., extinction is unavoidable if m < 1. There is a trivial exception, though, associ- 
ated to /7i = 1 (and zero for the rest); this is an extremely boring situation indeed. 
In this case, f{x) = x, and therefore lim/^(0) = 0, which means, obviously, that the 
probability of extinction is zero. 

If the slope of f{x) at x = 1 is m> 1 (which only can happen for a non-linear 
generating function, po-\-p\ < 1), then f{x) has to cross the diagonal at a point x* 
smaller than one, which is the attractive solution to which the iteration tends, see 
Fig. [6] again. In mathematical language, 

Pe,t = limf{0)=x\ (24) 

where 

X* =/(x*) withx* < 1. (25) 

The demonstration is elaborated in the Appendix. 
Summarizing, 

Pe. = i\'^,"'-] (26) 

^""^ I X* if m > 1 ^ ^ 

with X* < 1, except in the trivial case pi = I, which has m = 1 but yields Pext = 0. 

Equation ( |26| ) clearly shows that, in general, the point m = 1 separates two dis- 
tinct behaviors: extinction for sure for m < 1 and the possibility of non-extinction 
(non-sure extinction) for m > 1 . Therefore, m = 1 constitutes a critical case separat- 
ing these behaviors, called therefore subcritical (m < 1) and supercritical (m > 1). 
It is instructive to point out that, as x = 1 is always a solution of f{x) = x, Watson 
concluded, incorrectly, that the population always gets extinct, no matter the value 
of m ( ,Kendall,,1966| ). 



2.6 The probability of extinction for the binomial distribution 

For the sake of illustration we will consider a simple concrete example, a binomial 
distribution ( |Ross||20Q2[|Grinmiett and Stirzaker||20Ql| ), 

Pk = PiK = k)=(^l^p'{l-pr-', for/: = 0,...^. (27) 

This assumes that each element has only a fixed number of trials n to generate other 
elements, and any of these n trials has a constant probability p of being successful. 
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The generating function turns out to be, using the binomial theorem 

/w = £ (l) i^-pf-'p'^ = {^-p+p^r- (28) 

Let us consider the simple case with ^ = 2, and define ^ = 1 — As we know, 
the probability of extinction will come from the smallest solution in [0, 1] of 

x={q^pxf. (29) 

So, 

_ l-2pq±^{l-2pqf-Ap^q^ 
X- ' ^^^^ 



but the square root can be written as ^/Y^^Ap{\^^p) — y^^(T^-2^)^ = (1 — 2/?), and 
then, 

^^\z2E±2l±^lz2£lJ^{^- ,3,, 

Therefore, the smallest root depends on whether p is below or above 1/2 

r 1 ioxp<\ 



(32) 



As for the binomial distribution m = np = 2p ( |Ross|[20Q2| ), the critical case m = 1 
corresponds obviously to = 1/2, in agreement with the behavior of Pext- 



2.7 No stability of the population 

Although this subsection contains an interesting result to better understand the be- 
havior of the Gallon- Watson process, it can be skipped as it is not connected to the 
rest of the chapter. In fact, the iteration of the point x = shows what happens to 
the whole generating function of when ^ ^ oo. Indeed, in the same way as in 
subsection 2.5, 

lim fz, (x) = lim / (jc) = 1 if m < 1 , (33) 

whereas 

lim/z,(jc) = lim/(x) =jc* < 1 if m > 1, (34) 

except for X = 1, which always fulfills lim^^oo/^(x) = 1, see Fig.]?]). 

Note that a flat generating function corresponds to probabilities equal to zero, 
except for the zero value, i.e., 

lim P{Zt =k)=0, except for = 0. (35) 
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0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 



Fig. 7: Successive compositions of /(x), for all x, yielding the probability generating 
functions of Z^, starting dX t = I (lighter red) up to ^ = 15 (darker red). Larger t 
leads to flatter functions, approaching the fixed point. From left to right, subcritical, 
critical, and supercritical cases, using a binomial model with n = 2. 



In this way, for m < 1 we have that \m\t^ooP{Zt = 0) = 1, and the population gets 
extinct; but for m > 1 we have found lim^^oo/^(Z^ = 0) = x* < 1; having any other 
finite value of ^ a zero probability, this means that goes to infinite, when ^ ^ oo, 
with probability 1 — x*; that is, Zt cannot remain positive and bounded. The only 
stable state is extinction. Obviously, in this limit the Galton-Watson process is un- 
realistic, as other external factors should prevent that the population goes to infinity. 
But we do not need to bother about that, if we understand the limitations of the 
model. 



2.8 Non-equilibrium phase transition 

Let us analyze in more detail what happens around the "transition point" m = L As 
we just have seen, recall equation ( [25] ), the extinction probability is given by the 
solution of Pext = f{Pext)' When m < 1 the only solution in [0, 1] is Pext = 1 (except 
in the trivial case pi = 1). When m > 1 we have to take the smallest solution of 
Pext = f{Pext) in [0, 1]. In terms of the non-extinction probability, p = 1 —Pext^ we 
need to look for the largest p that is solution of 

/(l-p) = £/7,(l-p)^ = l-p, (36) 

in the range [0, 1]. We explore the case of Pext close to 1, for which p is close to 
zero, and, using the binomial theorem, we can expand (1— p)^ = l— /:p+ k{k — 
l)p^/2 H , which yields 
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l,Pk-Y. ^PkP + 9 L Kk-^)PkP^ + • • • = (37) 
= l-mp + -Aip^ + --- = 1-p, 

where we have introduced the mean m and the second factorial moment jl = {K(K — 
1)) (which we assume exists). Therefore, up to second order in p we need to solve 

^Aip + l-m )p :^0. (38) 



It is immediate that one solution of equation ( |38) ) is p = 0, and one can realize that 
this solution is exact up to any order in p. The other solution isp^2(m— l)//i, but 
we must pay attention to the value of /i, which can be written as /i = cr^ + m(m — 
1), with (7^ = {{K — m}^) = {K^) — m^, i.e., the variance. Existence of m and 
guarantees the existence of /i, then. Assuming ^ 0, 



2(m-l)_ 2(m-l) _2(m-l) 

II G^[l^m{m-l)/G^] G^ 



^ m(m—\) 



(39) 



(using the formula for the geometric series), therefore, p around zero means m 
around one, and we can write the second solution as 

(40, 

G^ 

which is only in the range of interest for m > 1 . 
In conclusion, we have 

p = if m < 1 

p ::^2(m- l)/(7^ if m> 1, 



(41) 



valid in the limit of small p. For m > 1 this limit is equivalent to m ^ 1 . The separate 
case O"^ = is only achieved in the trivial situation where p\ = I (otherwise, the 
mean cannot approach one). 

In this way, we obtain a behavior that is the one corresponding to a continuous 
phase transition in thermodynamic equilibrium. Identifying m with a control param- 
eter (as temperature, or more properly, the inverse of temperature) and p with an 
order parameter (as magnetization in a magnetic system) these transitions show an 
abrupt but continuous change of p as a function of m at the transition point m^, with 

p = below Mc 

p oc {m — mc)^ above but close tom^ 

For magnetic systems, rUc corresponds to the so-called Curie temperature. For the 
Galton-Watson branching process we can extract from equation ( |4T] ) that 

me = landj3 = l, (43) 
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where we assume that the variance of K does not go to zero at the transition point. 

We can compare the previous general result, p :^ 2(m — l)/(7^, for m above but 
close to 1, with the result we found for the binomial distribution with n = 2 (see 
equation ([32])), for which 



p = l 



1 



2/7-1 



(44) 



when p> 1/2. Using that in this case m = np and = npq (see 

2(m-l) _ 2p-l ^ 2p-l 
pq p^ ' 



Ross 



(20021), 



(45) 



because q = \ — p ^ p fox p ^ \ /2. So, equations ([32]) and ( |4T] ) agree close to the 
transition point. Figure [8] shows also how they disagree as m increases. 





Fig. 8: Left: non-extinction probability p as a function of the mean number of off- 
springs per element, m. Dashed line corresponds to the approximation explained 
in the text (eq. (|4T])). The abrupt change in p is the hallmark of a continuous phase 
transition. The model is binomial with n = 2. Right: the same but as a function of the 
rescaled distance to the critical point, 2(m — l)/cr^, where refers to the variance 
at m = 1. The Poisson and the geometric distributions are also studied. 



Finally, for completeness, we can play with the pathological case given by 
G^ = 0. Let us consider first the following model, pQ = \ — p\ = X\ (and zero 
otherwise), with Ai < 1. Then, m = Ai, and we know that p = 0. Next, let us con- 
sider pi = I — A2, P2 = X2 (and zero otherwise), giving m = 1 + A2. In this case, 
p = 1 always, yielding a discontinuous, or first order phase transition. 
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2.9 Distribution of the total size of the population: binomial 
distribution and rooted trees 

Our main interest will now be to calculate the total size S of the population, summing 
across all generations, i.e., 

^=£z„ (46) 

this corresponds to the total number of individuals that have ever been born, the total 
number of neutrons participating in a nuclear chain reaction, or the energy released 
during an event in an earthquake model. 

Let us go back to the concrete binomial case, 

Pk = P{K = k)=(^^p\\-pY-\ for/: = 0,...^. (47) 

The size distribution can be calculated using elementary probability and combina- 
torics. One needs to take advantage of the representation of a branching process 
as a tree (which is a connected graph with no loops). Each element is associated 
to a node, and branches linking nodes indicate an offspring relationship between 
two nodes. Naturally, all nodes have just one incoming branch, except the one cor- 
responding to the zero generation (which in this context is called the root of the 
tree). So, the number of branches is the number of nodes minus 1. As the size s 
of a tree is the number of nodes it contains, the number of branches is ^ — 1, and 
the number of missing branches (non- successful reproductive trials) i^ns — {s—\) 
(because the number of possible branches arising from s nodes is ns) ( Christensen' 
[^id Molone3^|2QQ5] ). Therefore, a particular tree of size s comes with a probability 



^(l— p)^^ ^W^^ and the probability P{S = s) of having an undefined tree of 
size s is obtained by summing for all possible trees of size s. In the case n — 2 the 
number of trees with s nodes is given by the Catalan number 

Cs = ^(^'), (48) 



1 V , 

see the Appendix for its calculation. Then, 

^(^ = ^) = ;Tt(^/)^"'(1-^)'^' with. = 1,2,... (49) 

It can be checked, using the generating function of the Catalan numbers, that this 
expression is normalized for /? < 1/2 but not for > 1/2, in fact, 

Y,P{S = s)=Pe,t, (50) 



see the Appendix again. 
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Nevertheless, the exact expression we have obtained for P{S = s) does not teach 
us anything about the behavior of this function (unless one has a great intuition about 
the behavior of the binomial coefficients). In this regard, Stirling's approximation is 
of great help ( [Christensen and Moloney | [2005 ). It states that, in the limit of large 
one can make the substitution 



Nl^VlnNl-] , (51) 



see the Appendix once more. The symbol e is nothing else than the e number. So, 
for large sizes we can apply the approximation to s and also to 2s, 



(2^)!- V4;r^( — ) . (52) 



Therefore, the binomial coefficient turns out to be, 

'2s\ {2s)l 1 {2sf' 4' 



and the Catalan number, replacing ^ + 1 ~ ^, 



(53) 



Cs = -^(^f]-^^- (54) 



1 V ^ / VnsV^' 

This is an exponential increasing function of s, and the term s^^^ does not seem to 
play any role, asymptotically. However, introducing the factor p^~^{l — pY^^, we 
go back to equation ( |49| ), getting 

p., ^ l-p [4p{l-p)Y 

"^^' = '^^7^ .3/2 • (55) 

Notice that p{l — p) is no larger than 1/4, so the exponential term becomes de- 
creasing, except for = 1/2, where it disappears. We can go one step further, by 
writing, 

[4p{l -p)Y = ^^M4p(i-P)] = ^-s/^{p) (55) 
with the characteristic size defined as 

^iP)=(^^ . ,] ^ \ (57) 



Ap{\-p) 

and finally equation ( [55] ) reads. 
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So, for s large, but substantially smaller than ^ (p) , the size probability mass func- 
tion is a power law, with exponent 3/2. For larger s, the exponential decay domi- 
nates. The exception is the critical case, = 1/2, for which ^ (p) becomes infinite, 
the exponential disappears and the distribution is a pure power law. In this case the 
exponent 3/2 is a critical exponent. The reader can see the goodness of the approxi- 
mation in Fig. [9] 




s 



Fig. 9: Probability mass functions of the total size of the population S , for different 
values of the parameter p of a. binomial distribution with n = 2, both in the subcrit- 
ical and critical cases. The asymptotic solution for large s is also shown. The pure 
power law at the critical point becomes apparent. 



Another critical exponent arises for the divergence of the characteristic size ^ {p) . 
Introducing the deviation with respect to the critical point, A = p — pc = p — 1/2, 
one can write, 

p{l-p) = \-A\ (59) 



and so, close to the critical point (for small A), 

1 1 



4p{l-p) 1-4A^ 
(using the formula of the geometric series), then 



l+4Zi^ + ... (60) 
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In --^ . ^ ln(l + 4A^) ^4A^^... (61) 

4/7(1-/7) 



(using the Taylor expansion of the logarithm at point 1) and 



1 1 



Therefore, the characteristic size ^ {p) diverges at the critical point as a power law, 
with an exponent equal to 2. This allows to write the asymptotic formula (s large) 
for the size distribution in a simpler form, close to the critical point (A small). 

Hence, after this perhaps long but worthwhile digression, we are able to say 
something about the energy distribution in Otsuka's model, which the reader will 
have already noted is a particular case of the Gallon- Watson process. If one takes 
p < 1/2 the resulting energy distribution has an exponential tail, with a character- 
istic scale given by (/?). This means that earthquakes attenuate, or get extinct, and 
in no way can dissipate energies larger than the scale provided by t,{p) (the prob- 
ability of having an earthquake of size larger than I0t^{p) is ridiculously small). 
This is the subcritical case. On the other hand, if p > 1/2 there are two types of 
earthquakes, first, those similar to the subcritical ones, with a size limited by the 
scale defined by t,{p), and second, infinite or never-ending earthquakes {Pext < 1), 
where the initial small perturbation (the toppling of just one domino piece) grows 



exponentially. This is the supercritical regime (Ben-Zion 2008). Neither the sub- 
critical nor the supercritical case are in correspondence with the Gutenberg-Richter 
law, which yields a power-law distribution of energies, and therefore the absence 
of a characteristic scale. But this is precisely what corresponds to the critical case, 
p = 1/2, which yields also a power-law distribution. Thus, the propagation of an 
earthquake through a fault is not only stochastic in the sense that when a patch fails 
one does not know what will happen next, but it is worse than that, as a critical 
process is equally likely to intensify or attenuate. Note how difficult is to achieve a 
critical behavior, as p has to be finely tuned to 1/2, otherwise criticality is lost. In 
terms of domino topplings this is what is really difficult, and not to get a full-system 
supercritical toppling, which, despite its mathematical triviality, deserves a lot of 
attention from the media when a Guinness world record is broken. 

The agreement between the model and real earthquakes is qualitative but not 
quantitative, as the model leads to a = 3/2 whereas for earthquakes ac^5/3c^l.67. 
In the next subsection we will explain that the model value of 3/2 is rather robust 
and other versions of the Gallon- Watson process lead to the same exponent. This 
discrepancy has been explored in detail by Kagan| ([2010), who argues that there 



are a series of technical artifacts that make increase the value of the exponent for 
earthquakes, and therefore, following Kagan, both exponents would be close and 
probably compatible. 
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2.10 Generating function of the total size of the population 

In order to advance further in the understanding of branching processes, our Httle 
story carries us to the U.S. during the Second World War. While soldiers were fight- 
ing in the field and civilians were suffering the horrors of war, a group of scientists 
gathered in the peace of Los Alamos, New Mexico, to do research to develop the 
first nuclear bombs. Among these brilliant people was the great Polish mathemati- 
cian Stanislaw Ulam, who was hired by his famous colleague John Von Neumann 
( |Ulam( 1991 ). Together with David Hawkins (philosopher of science and most tal- 
ented amateur mathematician ever known by Ulam) they were investigating the mul- 
tiplication of neutrons in nuclear chain reactions, using what we call now branching 
processes. It seems that they were unaware of the pioneering work of Gallon and 
Watson. 

Hawkins and Ulam showed, among other things, that the generating function 
g{x) of the total size of the population, S = Y.Mt'^t^ fulfills, in the non- supercritical 
case, 

g{x)=xf{g{x)) (64) 

where, as usual, f{x) is the generating function of the number of offsprings of an 
individual element. What follows in this subsection is based in their work for the 
Manhattan Project ( ^Hawkins and Ulam 1944[ Ulam 199Q| ), but our derivation is 



somewhat simpler. What we call total size of the population will correspond to all 
neutrons generated during the reaction. 

First, it is convenient to consider the size from generation 1 to T (excluding by 
now the zero generation). This is 

S, = j^Zt (65) 

t=\ 

with probabilities q'^^ = P{S^ = s) and a generating function g^ix) = Y.ysQs^'^^^- 
A size s in generations from 1 to T can be decomposed into a size k in the first 
generation, with probability p^, and a size ^ — ^ in the remaining T — 1 generations 

(t— 1 k) 

(from 2 to t), but starting with k elements; this has a probability ' ^ . (Note that, 
with this notation ql — q) ' .) Then, using the law of total probability. 



^i'^=Iw^!l^'''^ (66) 

k=\ 



except for s = where q^—poAf we multiply by and sum for all s, from to 
oo, we will obtain on the left hand side the generating function of S^, which turns 
out to be 



(67) 
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Fig. 10: Polish mathematician Stanislaw Ulam, together with the current version of 
the first page of Ref. Hawkins and Ulam (1944]), after being unclassified for public 
release. The work, in which important formulas for branching processes are derived, 
was done as a part of the Manhattan project. 



The term inside the square brackets is the generating function of the size from 1 to 
T — 1 generations but, instead of starting with one single element (the usual Zq = 1), 
starting with k elements (Z\ =k). As these k parents are independent of each other, 
the resulting size will be the sum of k independent random variables, each with gen- 
erating function which yields as the corresponding generating 
function, that is, 

[^,_i(x)]*= £ (68) 

Substituting into equation ( [67] ), this leads to 

It W =Po^Y. PkiST-i {x)]'x' = f{xg,_, (x)) (69) 
k=i 

where we have introduced the definition of f{x) = /^(x). 

If we want to include the zero generation in the size, we need to add an indepen- 
dent variable with generating function x (as Zq takes the value 1 with probability 1), 
and then, the generating function of the size from generation to T is the product 
g^(x) = xgT;{x). This leads to 



g^{x) =xf{g^-i{x)). 



(70) 
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Coming back to the total size, 

oo 

S=Y,Zt, (71) 

the corresponding generating function is g{x) = lim^^oogrl-^)- the probability 
of extinction is one, i.e., if the system is not supercritical, this is the same as 
lim^^oo^T-i (-^), and therefore we have 

g{x)=xf{g{x)). (72) 

So, the desired generating function is the solution of this equation, with f{x) known. 
We will not be able to solve it in general; however, notice that this is not necessary 
in order to get the moments of S. Differentiating equation ( |72| ) with respect x one 
obtains 

g'ix) = figix)) +xf'igix))g'ix), (73) 
and taking x = I and isolating, 

<^)=^'(^) = r^ = T^> (74) 

which goes to infinity as (K) =m = f'{l) goes to 1, that is, at the critical point. Of 
course, as we have mentioned, the result is not applicable in the supercritical case, 
m > 1, where the population can growth to infinite with a non-zero probability. 
Further differentiation yields higher-order moments. 
The same result could have been obtained directly, as 

(S) = (Zo+Zi+Z2 + ---) = (Zo) + (Zi) + (Z2) + -- - = l+m + m^ + - ^ 



l—m 
(75) 

where the last equality only holds in the subcritical case, otherwise, (S) goes to 
infinity. 

In a few cases, the equation for g{x) allows to easily obtain a solution. Revisiting 
the binomial example with n = 2, for which f{x) = {I— p-\- px)^, one gets 

g{x) =xf{g{x))=x{l-p^pg{x))\ (76) 

from where 



1 — 2pqx ± ^/l —4pqx 
2p^x 

with q = I— p. Using the Taylor expansion for the square root term (see the Ap- 
pendix), 

yr^4^ =l-2pqx-'£ ^ 7 {pqxY^' , (78) 
and recognizing the Catalan numbers Q there, we get (see the Appendix), 
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g{x) = ^Y.^s{pqxy^ (79) 

P s=l 

where we also realize that only the minus sign before the square root leads to a true 
generating function. Therefore, the coefficients of lead to 

P{S = s)=Csp'-'q'^\ (80) 

for ^ > 1. This result is exactly the same as the one we obtained previously in a 
different manner (see equation (|49|)), although in this way we do not need to count 
trees, as the Catalan numbers arise directly in the series expansion (in fact, we do 
not even need to know them). 

We confirm that the results for Otsuka's binomial model yield a size exponent 
equal to 3/2. But it would be desirable to test the robustness of such exponent value, 
as, after all, the model is a crude simplification of reality, and we would like that 
modifications of the model do not lead to a totally different behavior. Despite the 
difficulty to find the power-law behavior (for which we need to finely tune the pa- 
rameter p to 1/2), if one considers other models different than the binomial one, the 
asymptotic behavior of the size distribution is in general always given by a power 
law with exponent 3/2, in the critical case; this can be proved by means of Cauchy's 
formula and assuming only finite variance, see |Qtter| ( | 1 949| ) ; [Harris] ( | 1 963] ) . So, go- 



ing beyond robustness, it is common to denote such invariance as universality. 



2.11 Self-organized branching process 

At this point we are ready to accept the agreement, not only qualitative but, follow- 
ing Kagan's remarks ( |Kagan| [2Q1Q| ), also quantitative, between a critical branching 



process and earthquake occurrence. So, in order to tune the model to reality we just 
need to take p = 1/2 (in Otsuka's binomial case) or m = 1 (in general) and the 
agreement is really satisfactory, and we could finish our search for a model here. 

But we can try to go one step farther and ask: why do we find that the tectonic 
systems (and other geosystems related to natural catastrophes) are always keeping 
a delicate balance between a subcritical and a supercritical state, i.e., in an apparent 
critical state? Can the coincidence be just fortuitous? In the reproduction of indi- 
viduals one could devise an evolutionary explanation. Imagine a series of isolated 
islands, each one occupied by a population following a Gallon- Watson process but 
with different parameters for each island. It is clear that islands with subcritical 
populations get deserted after a number of generations. Populations in supercritical 
islands either get extinct also or explode exponentially, in which case we assume 
that the population collapses, due to the exhaustion of the resources (this is an in- 
gredient that is not in the original Gallon- Watson model). In the critical case, the 
population also gets extinct, but for a few of these islands the population can sur- 
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vive for very long times, much longer than in the subcritical and supercritical cases. 
So, after a long enough time we would only find critical populations. 

However, this evolutionary scenario is not applicable to a tectonic system, where, 
when the process (the earthquake) gets extinct, a new one will start sooner or later. 
Rather, the situation would be analogous to finding all magnetic materials on Earth 
at the onset of magnetization, which would mean that their temperatures would be 
equal to the Curie temperature of each material. One could suspect then that there is 
some mechanism enforcing criticality, where the temperature changes as a function 
of magnetization, and magnetization is kept at the border of the transition; in other 



words, both parameters are linked through some feedback mechanism ( [Somette 
[1992 ; Prue ssner and Peters [|2QQ6l ). 

'Zapperi et al.| (|1995') propose a model in this line. They start with a standard 
branching process but introduce some important modifications: 

• They limit the number of generations to a maximum T, so < ^ < T. 

• After the extinction of the process (which is obviously certain when the num- 
ber of generations is limited), the parameters of the process change for the next 
realization, in such a way that for subcritical cases (m < 1), the mean m of the 
number of offsprings for each individual unit increases, whereas in the supercrit- 
ical case (m > 1) the mean m decreases. The idea is to make the critical state 
m = 1 an attractor of the dynamics. 

In order to be more concrete, let us consider the usual binomial distribution with 
only 0, 1, or 2 possible offsprings and a probability p that each reproductive trial is 
successful. Then we already know that p < 1/2, p = 1/2, and p>l/2 correspond to 
the subcritical, critical, and supercritical cases, respectively. The dynamics proposed 
by Zapperi and coauthors relies on the activity that reaches the "boundary" of the 
system (defined by the last generation, t = t), which is Z^, changing the probability 
p through the following formula 

piT+l)=p{T)+ ^-^'^f^'^\ (81) 

with T a discrete time index counting the number of realizations of the process (do 
not confuse with t) and N = 2^+^ — 1 the maximum number of possible elements, 
i.e., the number of branches of the underlying complete tree. Thus, if the activity 
does not reach the boundary, Z^ is zero and the parameter p is increased by l/N, 
this is a very small number in the limit of very large systems (N oo). On the 
other hand, if the activity at the boundary is greater than one, p is decreased by 
(Z.-l)/A^. 

We already know that the expected value of Z^ is m^, with m the mean of the 
offspring distribution (m = 2pin our particular binomial model). Let us introduce a 
noise term, r] , which takes into account the fluctuations of Z^ with respect its mean, 
i.e., 7] = Z^ — m^. Obviously, by construction, (r]) =0. If we neglect, for a while, 
the noise term in equation ([ST]), the deterministic part reads. 
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p{T + 1) = F{p{T)) = p(T) + ^ (82) 
This is a discrete dynamical system, or a map, for which a fixed point /?* = F{p*) 



exists, = 1/2. Moreover, the fixed point is attractive, as < 1 ( [AUigood 

[etaL| [lW7l), due to T <C A^. 



Taking into account the value of the standard deviation of ( [Harris [ |1963| ), it 
can be shown that the noise term ri/N will have a vanishing effect in the limit of 
very large systems, and then the stochastic evolution will lead the system towards 
the deterministic fixed point, plus small random fluctuations around it. 

This spontaneous evolution of a system towards a particular organized state is 
referred to as self-organization. It is clear now that what Zapperi et al. introduced 
is a branching process that self-organizes towards a critical state. Nevertheless, the 
particular dynamics they propose seems a bit arbitrary. How can this kind of global 
control be implemented in a real system, where we expect the interactions between 
elements to be purely local? 



2.12 Self-organized criticality and sandpile models 



In fact, the self-organized branching process introduced by [Zapperi et al.| ( [1995 ) 
was naturally embedded in the previous notion of self-organized criticality (SOC), 
invented by Bak and coworkers in the 1980's (| B^ [T996| | Jensen| p^98l [Chris- 
[tensen and Moloney) [2Q05| ). Although it is not relevant for our story, it is worth 
to state that these authors were not interested in (because they were not aware of) 
the problem of power-law distributions in natural hazards ( [Bak[ [1996]); rather, they 
were mainly concerned to similar-in- spirit problems in condensed-matter physics, 
as charge density waves and one-over-f noise, as well as to the emergence of frac- 
tal spatial structures elsewhere ( [Bak et aL) [1987| ). The fact that earthquakes (and 
other hazards) were a manifestation of self-organized criticality was a fortunate by- 
product, pointed by l lto and Matsuzaki ( |1990| ), [Somette and Somette ( tl989 ), and 
[Bak and Tmig (1989) shortly after the introduction of the SOC concept, see also the 
review of Main[fl996| ). Nowadays, natural hazards are one of the main applications 
of SOC, despite the original lack of attention by |Bak et al. ( 1987 ). As we have seen 
through this chapter, ignorance seems a common characteristic of science evolution. 

The metaphor used by Bak in order to illustrate his ideas was that of a pile of sand 
( Bak[[1996] ). We have to recognize that the sandpile we are going to consider is a bit 
esoteric; in fact, there is a clear correspondence between the model and a pile only 
in one dimension (the one-dimensional model corresponds to a pile constrained in 
two dimensions, between two parallel plates ([Christensen et al.[[l996!)). But instead 



of keeping close to reality, it is more effective to deal with a mean-field sandpile; 
this is achieved either in a system defined in the limit of infinite dimensions or 
in a system in which each element has "random neighbors", and neglecting the 
correlations between the elements. Notice that Bak and colleagues make use of a 
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new concept, not present in the branching processes already explained: the notion 
of complexity, understood here as the nontrivial interaction between many units or 
agents, which will result in an emergent collective behavior that is different than the 



sum of the behavior of the individual parts ( ,Newman[ |2011 



So, consider a system consisting in a large number of elements, such that each 
element can store a certain number of discrete packages (or particles), but when this 
limit is surpassed the packages are released to other elements - the neighbors. The 
situation is analogous to what happens in a Ministry office. Each bureaucrat has a 
series of documents or papers (the packages) at his/her desk, but when the number of 
those is too big, he/she decides to do something about it and transfers some papers 
to some other (random) bureaucrats, and so on (Bak 1996| ). This simple behavior 
will lead to interesting dynamics, unexpectedly. 

To be specific, let us consider that each element can store at most one package; 
if some extra package arrives to it, the element releases two packages to some other 
units, taken randomly (either among all other elements, what defines random neigh- 
bors or among the 2d nearest neighbors in a J— dimensional square lattice). If, after 
the release, the number of packages is still greater than one (which may happen if 
the element received more than one package) the release process is repeated. All 
the elements evolve following a parallel updating of their dynamics, i.e., there is a 
common clock setting the time t of all elements. In a formula. 



[ Zi Zi - 2, 



(83) 



where Zi counts the number of packages of element / and n{i) denotes two of its 
neighbors. 

Obviously, this process can give rise to an avalanche in the transference of pack- 
ages, which only stops when all elements have no more than one package. In that 
case, the system is perturbed by the addition of one extra package to a randomly 
chosen element, and the dynamics starts again. This defines a new time scale, de- 
noted by T (in the same way as in the previous subsection). So, 

ifzi<\yi^ Zj ^ Zj^l, (84) 

where j denotes a randomly selected unit. The system also releases packages outside 
(or to the garbage can, in the bureaucrats picture); in a J— dimensional lattice this 
happens when a boundary element selects as a neighbor an external element; in a 
fully random-neighbor system this happen just with a small predefined probability 
for each element. This simple variation of the original sandpile model of |Bak et aT 



( [1987| ) (changing the topology of the system by means of a different selection of 
neighbors) can be viewed also as a mean-field version of the so-called Manna model 
( |Manna|p"991[|Christensen and Moloney! |2005| ). 



The simple rules of the model make that the total number of packages in the 
system, M, evolves, from the addition of one package to the next, accordingly to 



M(r + 1) = M(r) + l-drop(r), 



(85) 
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where drop is the number of packages that are expelled from the system. The key 
parameter of this model is p, defined, for each element, as the probability that its 
number of packages is equal to one (so they are at the onset of instability). But in a 
mean field description all elements are uncorrelated and equivalent, so we can define 
a generic p for the whole system, verifying p = M/N, with N the total number of 
elements. So, there is a probability p that an element releases two packages when it 
receives one. The action of release is what constitutes the generation of an offspring, 
which is the element that relaxes. Therefore, dividing equation ( [85] ) by N we obtain 



piT+l)=piT)+'-^^, (86) 



which we can recognize as essentially equation ( [ST] ), the one introduced by |Zappen 



et al. ( 1995 ) in the self-organized branching process. We have already realized that 
this equation provides a feedback mechanism of the number of packages into the 
toppling (branching) probability (early identifications of this obvious feedback in 



SOC were written by |Kadanoffl ( [T991| ) and |Sorneltel ( [T992l )). 



Both in the limit of an infinite dimension lattice or in a fully random neighbor 
system one realizes that the evolution of an avalanche corresponds to a set of prop- 
agating non-interacting packages (as the probability that the activity comes back to 
an element is vanishingly small), and therefore the activity evolves as a branching 
process. But note that the tree associated to the branching process does not corre- 
spond to a quenched underlying structure of the system, as the random neighbors 
are selected dynamically, at each time step. The limit T in the number of generations 
introduced by Zapperi and coauthors needs to be added as an extra ingredient in the 
model, enforcing the dissipation of packages to take place at the T time step. In sum- 
mary, this illustrates the correspondence between the mean-field limit of sandpile 
models and branching processes. This is enough for our purposes. Other chapters 
in this book illustrate in much more detail the dynamics of sandpiles. Nevertheless, 
it is worth mentioning that the first connection between SOC and critical branching 
process was published by | Alstr0m| ( | 1 98 8), where it was assumed, however, that the 



system was in a critical state from the beginning. Notably, much before, Vere-Jones 



(1976]) had proposed a branching model very similar to Otsuka's (but, as usual, 



unaware of it) and realized that the tectonic system should evolve spontaneously 
towards criticality. Also, very recently, [Hergarten| pO 1 2} has introduced a variation 
of Zapperi et al.'s branching model that evolves only with local rules. 

Recapitulating, self-organized criticality offers a coherent framework for the un- 
derstanding of earthquakes and many other natural hazards mentioned in the first 
section. Indeed, both phenomena (SOC and earthquakes) show a highly non-linear 
response, where a small and slow perturbation or driving (the addition of grains, 
or the stress provided by the motion of the tectonic plates) pumps energy into the 
system, which, due to the presence of local thresholds stores that energy, until at 
some point some threshold is surpassed. The resulting release of energy propagates 
locally, which can trigger further surpassings of thresholds, generating a chain re- 
action or avalanche. One key point is that the energy released in such a way has to 
be power-law distributed, so the system responds in all possible scales. Notice also 
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that the dynamics shows a time- scale separation, as the avalanches happen infinitely 
fast compared with the driving (the toppling of grains is stopped during the propa- 
gation of an avalanche). Moreover, |Main| ( p"996| ) mentions additional characteristics 
of seismicity present in SOC models, namely, stress drops that are small in compar- 
ison with the regional tectonic stress field and the existence of seismicity induced or 
triggered by relatively small stress perturbations. All this makes SOC a very plau- 
sible mechanism for earthquakes. The connection is made still more concrete using 
variations of the sandpile models that mimic the behavior of the spring-block model 
of |Burr idge and Kno poff] ( pW/] ) as the so-called OFC model ( |Qlami et allp^92l ). 
See also Main (1996). 

However, as far as we know, the authentic hallmark of SOC, the existence of an 
underlying second-order (continuous) phase transition, has not been found in earth- 
quakes. The very nature of SOC makes almost impossible to identify such an abrupt 
change of an order parameter when a control parameter changes (because the control 
parameter is attracted towards the critical point). Nevertheless, this elusive behavior 
has been found in a different system: rainfall ([Peters and Neelin[ [2006]), thanks to 
very large fluctuations from criticality; so, if a control and an order parameter could 
be measured and if similarly large fluctuations were exist, one would finally prove 
the existence of SOC in earthquakes. 

The same reasoning applies to other natural hazards, for which, at least, sandpile- 
like models are abundant in the literature, and their classification as SOC systems 
is plausible ( Jensen 1998[ ). The case of hurricanes is still not clear ([Corral 2010[) 
whereas for tsunamis we can state that their power-law distribution ( Burroughs and 
Tebbens[[2005[ ) does not arise from a SOC mechanism, as they are not slowly driven 
(rather, they are violently driven by earthquakes, landslides and meteorite impacts). 

Finally, it is worth mentioning that there is another connection between branch- 
ing processes and earthquakes. Instead of using the branching to model the prop- 
agation of individual earthquakes, it is used for the way in which one earthquake 
triggers other earthquakes, i.e., aftershocks, following the so-called Omori law. The 
most representative model of this kind is the epidemic-type aftershock- sequences 
(ETAS) model ( [Ogata| [1999| [Helmstetter and Sornette[ [2002[ ). Interestingly, the 
evolution model of Bak and Sneppen ( [1993 ) (another paradigm of SOC) can be 
interpreted to reproduce the statistics of earthquakes from this (slow) time scale 
( ItQ[[1995 ]). This perspective opened a whole new line in statistical seismology, but 
this is a different story ( [Bak et al.[[2002[[Corral]|2004a|b[ ). 



3 Conclusions 

We started this chapter showing some remarkable statistical properties of earthquake 
occurrence, and ended up mingling with infinite-dimensional sandpiles models for 
self-organized criticality. In between, we learnt a few things about branching pro- 
cesses. Now we sketch some consequences for our initial object of study: natural 
hazards. 
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First, besides any model, we can say a few things just by looking at the data: 
earthquakes and other natural hazards follow a power-law distribution of sizes, in 
some cases with an exponential cutoff due to finite-size effects (the Earth is finite, af- 
ter all!). For the particular values of the exponents found, this implies that, although 
big events are less likely, they are always the main contributors of the overall dev- 
astation. As financial data of asset returns and other social and technological data 
have also been reported to follow power law distributions ( [Mantegna and Stanley| 
1999[|Newman[[20Q5] ), one wonders what the points in common with these systems 



and natural hazards can be. 

Regarding Otsuka's rupture model, we showed how, by using a fairly simple 
stochastic cascade setup for the local dynamics of fault patches and the mathe- 
matical formalism for branching processes, one can reproduce the global statistical 
properties of real earthquake occurrences (and other natural hazards). This is quite 
remarkable, as it constitutes a link between two distinct observational scales: the 
micro-scale of local dynamics, and the macro-scale of global statistical behavior. 

But Otsuka's model is a particular case of the Galton-Watson branching pro- 
cess. So, first, we presented in an easy way the main results already known for such 
processes (main results in relation to our interests). We explained how the machin- 
ery of probability generating functions allows to find a formula for the activity (or 
population) at any generation of the process. In the limit of infinite generations, 
one gets the probability of extinction, which shows an abrupt change between two 
different regimes: extinction for sure if the mean number of offsprings is below 
or equal to one, and the possibility of non-extinction in the opposite case. Further 
progress leads to an expression for the probability of the total size of the process (the 
total population ever born or the total energy radiated by an earthquake). It is pre- 
cisely at the border of the two mentioned cases, at the critical point of the transition, 
that one finds a behavior compatible with earthquakes and other natural hazards. A 
power-law distribution with exponent 3/2 emerges in this case; however, it remained 
unexplained how the Earth should drive itself towards such a critical state. 

In this regard, we showed how, by using a simple feedback mechanism, one can 
turn the critical point into an attractor of the model. A global condition, related 
with boundary dissipation, acts on the probability of activation, in such a way that 
when this probability is low, it increases, and vice versa when it is high. Idealized 
sandpile models in the mean-field limit implement in a natural way this mechanism, 
by means of the transport of particles through the system up to the boundaries where 
they are dissipated. The content of particles regulates the activity in the system. 

It is worth mentioning that going beyond the mean-field limit and turn to lattice 
(more realistic) systems makes things terribly complicated, and the researcher has 
to rely more and more on computer simulations and losses the guide of exact, or at 
least approximated analytical treatments. But this makes the mathematical problems 
that these systems pose much more interesting and exciting. For sure, researchers 
will devote their efforts to them for decades. 

As a final point, we have to recognize that criticality and self-organized critical- 
ity are not the only ways to generate power-law distributions. In fact, much sim- 



pler processes that yield power laws exist, as reviewed in |Sornette| ( |2004| ); |Mitzen- 
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macher (2004); Newman ( 2005| ). A well known mechanism that escapes from the 



normal-distribution attractor in diffusion processes is provided by anomalous diffu- 
sion ([Bouch aud and Georges', "1990), and its relation with sandpiles was studied by 



Boguna and Corral ( 1997 ), among others. Nevertheless, we believe the present work 



has clearly shown the plausibility of self-organized criticality for the explanation of 
earthquakes and natural hazards in general. A complementary, even more complex 
perspective is provided by|Ben-Zioii|(|2008|. 



Appendix 

Properties of power-law distributions 

Some facts about the power-law distribution are remarkable. Let us consider the 
probability density D{E) oc \/E^, defined between Eynin and oo. We may first calcu- 
late its mean, i.e., the expected value of given by 



{E)= / ED{E)dE. 



(87) 



It is easy to check that, when a < 2 (i.e. /? < 3/2), this integral becomes infinite, 
so, mathematicians would state the expected value of the energy does not exist, 
whereas physicists would say that that value is infinite. We take the second option, 
which is more informative as we are aware of what we are dealing with. Of course, 
the average energy radiated by an earthquake cannot be infinite (the Earth contains 
a finite amount of energy), so there is a problem extrapolating the power law up to 
infinity. With a normal distribution or with an exponential distribution (for example) 
we would not have such a problem of extrapolation, but it is worth to realize that 
this is a physical problem, not a mathematical problem - for instance, if instead 
of energy we were talking about time between some events, the mean time could 
perfectly be "infinite". Then, for physical reasons, there has to be an upper limit 
for the validity of the Gutenberg-Richter law; however, we have no idea about how 
large that limit should be. In practice, the fact that the mean energy becomes infinite 
means that the average energy one might calculate from a series of data does not 



converge, no matter the number of data. Figure 1 1 illustrates this fact for the case 
of mean seismic moment, which is considered to be proportional to radiated energy. 
Summarizing, seismologists are totally ignorant about the mean energy radiated by 
earthquakes, due to the special properties of power-law distributions. 

Although previously we interpreted as good news the fact that most earthquakes 
are of small size and only very few of them are devastating, the situation is certainly 
not so favorable. The reason is that the rare big events, despite their scarcity, are the 
ones responsible for the dissipation of energy in the system. For the particular value 
of a we are dealing with, it is easy to check that the largest order of magnitude 
considered in the energy (the largest decade, or scale) contributes to the total budget 
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Fig. 11: Mean seismic moment for worldwide shallow earthquakes with seismic 
moment greater that 10^^ Nm, using the CMT catalog, starting in 1980. This yields 
a total of 3363 events. Note that the mean value does not converge. The big jump at 
the end of 2004 is caused by the great Sumatra- Andaman earthquake. The radiated 
energy should lead to the same behavior. 



more than all the other scales below. In mathematical terms, 



no matter how big is c (see next subsection for details). 

A second peculiar property of power laws is scale invariance. Let us introduce 
the concept of scale transformation, considering an arbitrary function that we call 
D{E). The idea of a scale transformation is to look at the function /)(£") at a different 
scale, as for instance, using a mathematical microscope. We can have a view of the 
function at the scale of meters (if E and D{E) were distances) and try to see how it 
looks at the scale of centimeters. This is performed through a scale transformation, 
denoted by an operator T acting on the function Z)(£), as 



where c\ and C2 are two constants called scale parameters, performing a linear trans- 
formation on E and D. In the case of the meters-centimeters example, c\=C2 = 100. 

In general, almost every function changes under a scale transformation; the ex- 
ception can be found looking for the function or functions that verify the following 
condition. 




(88) 



T[D{E)]=C2D{E/ci), 



(89) 



D{E)=C2D{E/ci). 



(90) 



34 



Alvaro Corral and Francesc Font-Clos 



It is trivial to check that a solution is given by the power-law function 



with a given by 



lnc2 
Inci ' 



a -- 



(91) 



(92) 



in other words, a power law with exponent a does not change under a scale trans- 
formation if the scale factors are related through 



1 



Figure 12 shows how indeed this is the case, with ci = 10, C2 = vTO, and D(E) = 
\fE. Note that the constant of proportionality in equation ( [9T] ), contained in the 
symbol oc, does not play any role here. 

More importantly, it can also be demonstrated that not only the power law is 
a solution, but it is the only solution valid for all values of c\ (positive real) if c\ 



and C2 are related by equation ( [93] ) ( [Takayasu] |1989[ |Newman| |2005[ [Christensen 
and Moloneyl |20Q5[ |Corral[ |2Q08| ). In summary, the condition of scale invariance 
demands that 

D{E) = C2D{E/ci) for all ci positive real, (94) 

and then, the only solution is the power law. One can verify that other solutions, as 
D{E) = sin(ln£^), only work for special values of ci and C2. 




Fig. 12: A scale transformation acting on its corresponding scale-invariant function. 
The function is expanded by factors ci = 10 and C2 = aAO, in such a way that the 
small box at the left is the full figure at the right. The function is D{E) = \fE. 



Scale invariance is in fact the symmetry associated to scale transformations, in an 
analogous way as rotational invariance is the symmetry corresponding to rotations. 
If scale invariance is fulfilled, no characteristic scale can be defined for the variable 
E, in the same way as if there is rotational invariance in a system, this system cannot 
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be used to point at a particular direction (a compass cannot be built from a ball). 
Systems do not displaying scale invariance allow to define characteristic scales, as 
the exponential functions defining radioactive decay lead to the definition of the unit 
of time in terms of the half -life. 

There is, nevertheless, an important point to be taken into account here. If D{E) 
represents a probability density (as it is the case for the energy radiated by earth- 
quakes), then, D{E) cannot be a power law for all E >0, because it could not be 
normalized (its integral from to oo would diverge). We have already mentioned 
that it is necessary to introduce a lower cutoff Emin in order to avoid this fact. Also, 
sometimes the power law cannot be extended to infinity, for physical reasons. So, 
complete scale invariance is not possible for probability distributions, and one can 
have only a restricted scale invariance. However, in the case of earthquakes, as both 
the lower limit and the upper limit are not available from observations, scale invari- 
ance plays a genuine role. 

Scale invariance in the energy of earthquakes has some counter-intuitive con- 
sequences. Imagine that you arrive at a new country, and you are worried about 
earthquakes, and ask the people there the following question: how big are typically 
earthquakes here? Despite the innocence of such a simple question, due to scale 
invariance no characteristic scale for the energy can be defined and the question has 
no possible answer. 



Dissipation of energy in the largest scales 

Let us consider a (continuous) power-law distribution, defined, for simplicity, be- 
tween 1 and oo, with probability density, 

J^iE)^^. (95) 

We are going to see that, for a given r > 2 there exist values of a such that the 
contribution to the expected value of E from an interval 1 < < c is always smaller 
than the contribution from c <E <rc,no matter how big c is. 

The contribution of an interval a < £" < c to the mean value of E is 

f ED{E)dE oc (96) 

Ja 

Therefore, 

ED(E)dE oc c^-« - 1 , (97) 

and 

ED{E)dE oc c^-«(r^-« - 1). (98) 
In order that the last integral is larger than the previous one it is enough that 
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(r2-^-l)c2-^>c2-^. (99) 
So, r^~^ > 2 and this implies that 

a<2-log,2. (100) 

For r = 10, the (sufficient) condition becomes a < 1.699. In the case of earthquake 
radiated energy, a I -\-2b/3 1.667, and equation ( |100| ) is fulfilled. Though, 
slightly larger values of a violate the condition; nevertheless, there is nothing spe- 
cial in taking r = 10 (it is not a magical number!) and we have that equation ( |100| ) 
is fulfilled for a larger r. For r = 2 equation ( |100| ) would imply a < 1, but this is 
not an acceptable exponent for a power-law distribution (normalization would not 
be fulfilled). 



Rigorous proof of extinction probability 

Besides graphical arguments (see Fig. |6]), we want to provide a rigorous proof for 
the computation of the extinction probability in the Gallon- Watson process, given 
by 

Pe,, = \imf'{0), (101) 

where Pext is properly defined only if the limit exists. To see that this is always 
the case, we note that = =^ Z^+i = 0. Hence, {Z^ = 0} C {Z^+i = 0} and 
P{Zt = 0) < P{Zt^i = 0), so f{0) < (0) or, in words, (/) is a non-decreasing 
sequence. As /([0, 1]) C [0, 1], we conclude that f{0) is bounded and has a limit. 
To continue our proof, let us treat separately the two cases m< 1 , m > 1 . Hence, 



case m< I: 



As f{x) is non-convex for x > 0, it always lies above any straight line tangent to it 
( [Spivak||l967 ). In particular, we consider the line tangent to f{x) at the point (1,1), 



and 

f{x) > l+m(x-l) >jc. (102) 
Hence f{x) > x for < x < 1. Also, it is straightforward to see that f{Pext) = Pext, 

/(lim/(0)) = lim/(/(0)) = lim/+i(0) = Hm/(0), (103) 

and of course < Pext < 1. So we have that f{Pext) = Pext with < Pext < 1. Sum- 
marizing, Pext is a fixed point of f{x) in the interval [0, 1], but f{x) > x (strictly) in 
[0, 1). It is clear that the only option left is Pext = 1. 
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case m > 1 : 

We will start showing that Pext 7^ 1 in this case. First, as already said, {f) is a non- 
decreasing sequence. Second, as f{x) is continuous and /'(I) = m > 1, we have that 
f{x) < X for X G (1 - e, 1) for some e > 0. So, /(O) ^ (1 - e, 1) for all t (because 
it would then decrease). This means that the only way for f(G) to have limit 1 is 
to "jump over" the interval (1 — e, 1), that is, by means of some y <l—e such that 
f{y) = 1. But such y cannot exist because then f{x) < at some point between y 
and 1. 

Now we will see that the equation /(x*) = x* has a unique solution in the interval 
[0, 1). There must be at least one solution because /(O) > 0, and f{x) < x in (1 — 
e, 1) (here we are using Bolzano's theorem for f{x) —x). To see that this solution is 
unique, suppose there are two solutions, < xi < X2 < 1. As we also have /(!) = !, 
by Rolle's theorem there would exist two points , J2 such that f{yi)=f{y2) = ^ 
and xi <yi < X2 <y2 < ^, but this is impossible because f'{x) > in [0, 1], which 
means that f{x) is non-decreasing and hence takes any value only once in [0, 1]. 

So, if Pext ^ 1 but f{Pext) = Pext, then Pext must be the unique solution of /(x*) = 
jc* in [0,1). 

For the sake of rigor, we must point out that some "pathological" cases would 
need a separate treatment, such as f{x) = x, but those are almost never of actual 
interest. 



Catalan numbers 

The Catalan numbers owe their name not to a Mediterranean region but to the 
French-Belgian mathematician from the 19th century Eugene Charles Catalan. 
"His" numbers count a large variety of objects ( |Stanley[ |1999| ), in particular, the 
rooted trees that arise in the study of branching process when the number of off- 
springs can be 0, 1, or 2. We can consider a tree of size s as the root (corresponding 
to the zero generation of the associated branching process) plus the remaining ^ — 1 
nodes, these latter can be distributed as a varying number of nodes associated to 
the first branch, 0, 1 , ... 5- — 1 and the rest to the second branch, 5'— 1,^ — 2,... 1,0, 
respectively. Therefore, the number of trees Q of size s fulfills. 



where Cq is taken equal to one, as there is only one way in which a branch can have 
no elements. Note that from here we obtain 



Cs = CqCs-i + CiQ_2 H h + Q-iCo 



(104) 



Ci = {Cof = 1 
C2 = 2CoCi = 2 
C3=2CoC2 + (Ci)2 = 5 
C4 =2C3Co + 2C2Ci = 14 



(105) 
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and so on this simple formula generates all Catalan numbers. The curious reader 

o 

Ci = \ 



I \ 



C2 = 2 



A 



C3=5 




C4 = U 

Fig. 13: The number of rooted trees with no more than two branches per node is 
shown, up to size ^ = 4. The number of such trees of a given size is given by Q, the 
^-th Catalan number. 



can check Figure 13 where all possible rooted trees with no more than two branches 
per node, of size up to 4, are shown. 

If we want a closed expression for these numbers, we may define a generating 
function 

h{x) = Co + Cix + C2X^ H = 

^=0 



(106) 



One can obtain an expression for h{x) just using the properties of the Catalan num- 
bers ( |Wilf||1994| i. First, let us calculate 
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ij=0 



I 

5=0 



I CiCj 

i-\-j=s 



"5=0 



Q+1 



As we know that Co = 1, we end up with a quadratic equation for h{x), namely, 

x[h{x)f-h{x) + l=0, (107) 
which allows us to isolate h{x), 

IzbvT^^ 



h{x) ■ 



2x 



(108) 



One of both functions (depending on the =b sign) is then the generating function of 
the Catalan numbers. We are going to recover these numbers from its generating 
function. First, one needs the Taylor expansion of Vl —x around x = 0, which is 



Vl-x=l 



X Ix^ 3x^ 



1 



2 4 2! 8 3! '2 ^-^2'^^{s^l)\ 

where, remember, nil = n{n — 2) - - -l, and so. 



iJ^^^\ (109) 



(110) 



Then, substituting in h{x), one can realize that only the minus sign can correspond 
to a generating function, and 

Mx) = l + lf ^'^"^^"'^"' x-+' = l + f ^'^"^^"'V (111) 
2x^, is+l)l '+^^1 (^+1)! ' 

from where we obtain a first expression for the Catalan numbers. 



Cs 



(2^-l)!!2^ 



for5> 1. 



(.+ 1)! 

A more comfortable formula can be obtained using that 

(2s)! = {2s)U{2s-l)\\=s\2'{2s- 1)!!, 

and then one finds, 

Q 



1 f2s 



s!(s+i)! s+1 V y ' 



(112) 



(113) 



(114) 
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the standard expression for the Catalan numbers, now valid for all s>0. 



Normalization and non-normalization of the total size distribution 



We are going to illustrate how the total size probability distribution, P{S = s),is only 
normalized in the subcritical and critical cases. We use the binomial distribution for 
the distribution of the number of offsprings, with ^ = 0, 1 and 2. From the main text, 
we know that 

P(S = s) = ^ (^')p'~\^- pY^^ with 5 = 1 , 2, . . . (115) 



^+1 V 



It can be checked, using the generating function of the Catalan numbers, that this 
expression is normalized for < 1/2 but not for > 1/2. In order to see this, let us 
first consider the generating function of the Catalan numbers, derived in the previous 
subsection of the Appendix, 

Kx) = £c,x-= ^~^f^ . (116) 

s=0 

Then, introducing q=\—p, 

oo oo 

E = ^) = ! E = \ {Kpq) - 1) , (117) 

S=\ P 5=1 " 

and using the expression for h{x). 



2pq 2pq 2pq 

We can distinguish two cases, first, p < 1/2, for which, 

Mm)-1 = ^-1 = ^ = ^^, (119) 

q q max(/?,^) 

and for the opposite case, p> 1/2, 

/.(M)-l = --l = ^ = ^^^. (120) 
p p max{p,q) 



Therefore, 



q min{p,q) j 1 forp<l/2 
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Remembering the results for the extinction probability for the binomial distribution, 

oo 

'£P{S = s)=Pe,t, (122) 

which obviously is not normalized for > 1/2. We could also have arrived to the 
same result using, not the generating function of the Catalan numbers, but the gen- 
erating function g{x) of the size S. 



Stirling's Approximation 

Usually, Stirling's formula is demonstrated by means of the Euler-Maclaurin for- 
mula. However, if one knows some elementary properties of the gamma distribution, 
Stirling's formula arises almost spontaneously, by means of a probabilistic trick. 

Remember that the factorial is associated to the gamma function, nl =r{n-\-l), 
which is defined as 

poo 

r(r) = / yy-'e-ydy (123) 

Jo 

for 7 > (Abramowitz and Stegun[ |1965 ). This allows to introduce the gamma 



distribution ( |Durrett 2010 (, with probability density given by 



^ y^-^e-y (124) 



r(7) 



for J > (and zero otherwise), and with mean 7 and variance 7. 

It turns out that the gamma distribution arises as a sum of a number 7 of indepen- 
dent exponential random variables, each with density e'^ (this can be easily demon- 
strated through successive convolutions of the exponentials, see Durrett|pQlQ| )). But 



using the central limit theorem, the gamma distribution will converge, in the limit 



7^ 00, to a normal distribution (see Fig. 14), with mean fi and standard deviation 



(7 (in this case the notation is different to the rest of the chapter). 

Then, it will be possible to transform the gamma function into a Gaussian inte- 
gral. Indeed, 



: r{n+l) = [fe-ydy ^ C j^xp f"^^^^] dy. 



(125) 



The key point is to find the value of C for which both functions overlap. This hap- 
pens around the mean or the mode of both distributions, corresponding, respectively, 
toy = Y=n-\-l and y = ji. Substituting both values in 

fe-y=Ctxp(-^^^^^\ (126) 
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Fig. 14: Approaching of the normal distribution by the gamma distribution, adding 
8 and 100 exponentials, respectively. The central limit theorem allows the derivation 
of Stirling's approximation. 



we get 



C 



and therefore, looking for the normal probability density inside the integral. 



n\=r{n+\) -> VlnaC 



[ J— exp(- 
Jo vlna \ 



2a2 



dy. 



(127) 



(128) 



The value of a is obtained from = Y = n-\-l (for independent random variables 
the variance of a sum is the sum of variances, which is one for each exponential 
distribution in our sum). Substituting, and replacing the lower integration limit by 
— oo, due to the fact that the standard deviation a ^Jn is much smaller than the 
mean ji c^n, one obtains, 

n\^V27m(^-j , (129) 

valid, remember, in the limit n ^ oo. This proof has some parts in common with the 
more elaborated one of |Khan|(l974| ) and less resemblance with that of | van den Berg 
( [T995] ). 
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